Research Challenges in Ubiquitous Knowledge Discovery

Author

  • Michael May
Abstract

Ubiquitous Knowledge Discovery is a new research area at the intersection of machine learning and data mining with mobile and distributed systems. In this paper the main characteristics of the objects of study are defined and a high-level framework for analyzing ubiquitous knowledge discovery systems is introduced. Next, a number of examples from a broad range of application areas are reviewed and analyzed in terms of this framework. Based on this material, important characteristics of this field are identified and a number of research challenges are discussed.

Ubiquitous Knowledge Discovery

Knowledge Discovery in ubiquitous environments (KDUbiq) is an emerging area of research at the intersection of two major challenges: highly distributed and mobile systems, and advanced knowledge discovery systems. Today, in many subfields of computer science and engineering, being intelligent and adaptive marks the difference between a system that works in a complex and changing environment and one that does not. Hence, projects across many areas, ranging from Web 2.0 to ubiquitous computing and robotics, aim to create systems that are “smart”, “intelligent”, “adaptive” etc., making it possible to solve problems that could not be solved before.

A central assumption of KDUbiq is that what seems to be a bewildering array of different methodologies and approaches for building smart, adaptive, intelligent systems can be cast into a coherent, integrated set of key ideas centered on the notion of learning from experience. Focusing on these key ideas, KDUbiq provides a unifying framework for systematically investigating the mutual dependencies of otherwise quite unrelated technologies employed in building next-generation intelligent systems: machine learning, data mining, sensor networks, grids, P2P, data stream mining, activity recognition, Web 2.0, privacy, user modeling and others.
Machine learning and data mining emerge as basic methodologies and indispensable building blocks for some of the most difficult computer science and engineering challenges of the next decade.

From a high-level perspective, the key characteristics of a ubiquitous knowledge discovery application are:

  • C1. Time and space. The objects of analysis exist in time and space. Often they are able to move.
  • C2. Dynamic environment. These objects might not be stable over the lifetime of an application. Instead they might appear or disappear. They exist in a dynamic and unstable environment, evolving incrementally over time.
  • C3. Information processing capability. The objects are endowed with information processing capabilities.
  • C4. Locality. The objects never see the global picture, knowing only their local spatio-temporal environment.
  • C5. Real-time. Because the objects typically have to make decisions or even act upon their environment, analysis and inference have to be done in real time, and not only on historic data; the models have to evolve incrementally in correspondence with the evolving environment.
  • C6. Distributed. In many cases the object will be able to exchange information with other objects, thus forming a truly distributed environment.

Objects to which these characteristics apply are humans, animals, and, increasingly, various kinds of computing devices. It is the latter that form the objects of study for KDUbiq.

For analyzing the different possible architectures of ubiquitous knowledge discovery systems within a high-level framework, we introduce six dimensions of KDUbiq:

  1. Application Area.
  2. Ubiquitous Technologies.
  3. Resource Aware Algorithms.
  4. Ubiquitous Data Collection.
  5. Privacy and Security.
  6. HCI and User-Modeling.

When designing a ubiquitous knowledge discovery system, major design decisions have to be taken in each of these six dimensions. These choices constrain each other, and the dependencies among them have to be carefully analyzed.
KDUbiq thus adopts a systems view on how to build next-generation knowledge discovery systems. Two important aspects of ubiquity have to be distinguished, namely

  • the ubiquity of data, and
  • the ubiquity of computing.

In a prototypical application the ubiquity of the environment corresponds naturally to the ubiquity of the data; e.g., the spatio-temporally tagged data in case study 3 arise because the vehicles are moving, and in case study 5 they arise because the collections are owned by different people. But there are borderline cases that are ubiquitous in one way but not in the other, e.g. clusters or grids for speeding up data analysis by distributing files and computations to various computers, or track mining from GPS data where the data are analyzed on a central server in an offline batch setting.

To stimulate research and further define the field, the KDUbiq research network (www.kdubiq.org), funded by the European Commission, was launched in 2006. Currently it has more than 40 members. It is organized around working groups for each of the dimensions of KDUbiq. It has launched workshop series at KDD, ICDM, PKDD, and ECML/PKDD, with topics including mining data streams from sensor data, privacy-preserving data mining, spatial data mining, user modeling, and ubiquitous web mining. The general points that emerge from these activities are discussed in a joint book, currently under preparation, the KDUbiq “Blueprint on Ubiquitous Knowledge Discovery”. It aims for a comprehensive overview of the six design dimensions and the research agenda needed for implementing the KDUbiq vision.

To provide a more specific description of the content of KDUbiq, in this document we analyze a number of case studies (in this extended abstract the descriptions have to be shortened).
The following selection criteria have been used: (1) each case study focuses on a different domain; (2) it presents a challenging real-life problem; (3) there is a body of prior technical work addressing at least some of the six dimensions of ubiquitous knowledge discovery. Existing work is not necessarily done under the label of “Ubiquitous Knowledge Discovery”. The subject is new and draws on work scattered across many communities. For a review of earlier work on distributed data mining see [7].

Case Study 1: Autonomous driving vehicle

The first case study provides an impressive example of how machine learning can help to solve an important real-world task: the DARPA Grand Challenge. The goal was to develop an autonomous robot capable of traversing unrehearsed road terrain. The robot had to navigate a 228 km long course through the Mojave desert in no more than 10 hours. The challenge was won in 2005 by the robot Stanley, built by a Stanford-based team led by Sebastian Thrun [17].

Modern vehicles fit the basic characteristics of ubiquitous knowledge discovery systems very well: they exist in a dynamic environment, move in time and space, are equipped with sensors, and increasingly communicate with other devices, e.g. satellites for navigation. What sets Stanley apart from traditional cars on the hardware side is the large number of additional sensors, the computational power, and the actuators.

Stanley uses machine learning for a number of learning tasks, both offline and online. A classification task solved offline with machine learning is obstacle detection, where a first-order Markov model is used. A second task, handled online, is road finding: classifying images into drivable and non-drivable areas. An adaptive Mixture of Gaussians algorithm is used to model a distribution that changes over time. Online adaptation is needed here because it would be impossible to train the system offline for all possible situations.
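To make the idea of an adaptive mixture model more concrete, the following is a minimal sketch of an online Mixture of Gaussians over color samples. It is not Stanley's actual algorithm: the class names, the diagonal covariance, the replacement rule, and all parameter values (`match_dist`, `rate`, `decay`, `init_var`) are assumptions chosen for illustration. Each new sample either refines the closest component or, if nothing matches, replaces the weakest one, while a decay factor lets outdated components fade.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    mean: list           # per-channel mean
    var: list            # per-channel variance (diagonal covariance, an assumption)
    weight: float = 1.0  # decayed count of samples assigned to this component


@dataclass
class AdaptiveMixture:
    # All parameter values below are hypothetical defaults for illustration.
    max_components: int = 3
    match_dist: float = 9.0   # threshold on the squared normalized distance
    rate: float = 0.1         # learning rate for a matched component
    decay: float = 0.95       # forgetting factor applied at every update
    init_var: float = 100.0   # variance given to a freshly created component
    components: list = field(default_factory=list)

    def _dist(self, c, x):
        # Squared distance, scaled per channel by the component variance.
        return sum((xi - m) ** 2 / v for xi, m, v in zip(x, c.mean, c.var))

    def _closest(self, x):
        if not self.components:
            return None, float("inf")
        best = min(self.components, key=lambda c: self._dist(c, x))
        return best, self._dist(best, x)

    def update(self, x):
        """Adapt the model to one sample known to come from drivable terrain."""
        for c in self.components:
            c.weight *= self.decay
        best, d = self._closest(x)
        if best is not None and d <= self.match_dist:
            for i, xi in enumerate(x):
                diff = xi - best.mean[i]
                best.mean[i] += self.rate * diff
                # Exponentially weighted variance, floored to stay positive.
                best.var[i] = max(1.0, (1 - self.rate) * best.var[i]
                                  + self.rate * diff * diff)
            best.weight += 1.0
            return
        new = Component([float(v) for v in x], [self.init_var] * len(x))
        if len(self.components) < self.max_components:
            self.components.append(new)
        else:
            # Replace the component with the smallest decayed weight.
            weakest = min(range(len(self.components)),
                          key=lambda i: self.components[i].weight)
            self.components[weakest] = new

    def is_drivable(self, x):
        """Classify a sample as drivable if any component explains it well."""
        _, d = self._closest(x)
        return d <= self.match_dist
```

In such a sketch, `update` would be called on pixels from terrain already known to be drivable, and `is_drivable` would then classify the remaining pixels; when the road surface changes, new components take over without any offline retraining.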
Case Study 2: Activity recognition – inferring transportation routines from GPS data

The widespread use of GPS devices has led to an explosion of interest in this type of data. One emerging area is assistive technologies, for example a personal guidance system that helps cognitively impaired persons find their way through a complex transportation system. This application has been proposed by the project Opportunity Knocks [14][11]. The basic infrastructure is a mobile device equipped with GPS and connected to a server. An inference module running on the server is able to learn a person’s transportation routines from the GPS data collected. It can advise the user which route to take or where to get off a bus, and it can warn the user about errors, e.g. taking the wrong bus line. Machine learning algorithms are used to infer likely routes, activities, transportation destinations and deviations from a normal route. It is an unsupervised learning task. The basic knowledge representation mechanism is a Dynamic Bayesian Network. In further work, Conditional Random Fields are used.

Case Study 3: Intelligent Multi-Agent Systems – Smart Home

MavHome [3] is a project that aims to build an intelligent environment, a smart home, which is able to acquire and apply knowledge about its inhabitants and surroundings. A home is seen as a rational agent capable of perceiving the state of the home through sensors and acting upon the environment through effectors. MavHome uses a sensor network for perceiving light, humidity, smoke, gas (CO), and motion, together with door, window, and seat status sensors. Inhabitant localization is done using passive infrared sensors. The software architecture is based on CORBA for communication between agents. The system combines multiple heterogeneous machine learning algorithms in order to identify repeatable behaviors (patterns), to predict inhabitant activity and to learn a control strategy.
The information is used for automation and optimization of the conditions in the house. For detecting patterns, a sequential pattern mining algorithm, ED, is used, which minimizes description length and processes data as they arrive, thus assuming a data stream setting. Behavior prediction is done via the ALZ algorithm, which takes ideas from the well-known LZ78 text compression algorithm. The predictive performance on real-world data collected over a month was 44% when ALZ was combined with ED.

Case Study 4: Real-Time Vehicle Monitoring

The Vehicle Data Stream Mining System VEDAS [6] is a mobile and distributed data stream mining application. It analyzes and monitors the continuous data stream generated by a vehicle. It is able to identify emerging patterns and report them back to a remote control center over a low-bandwidth wireless network connection. Applications are real-time on-board health monitoring, drunk-driving detection, driver characterization, and security-related applications for commercial fleet management.

VEDAS uses a PDA or other lightweight mobile device installed in a vehicle. It is connected to the On Board Diagnostic System (OBD-II); other sensory input comes from a GPS device. Significant mining tasks are carried out on board, monitoring the state of the transmission, engine and fuel systems. Only aggregated information is transmitted to a central server via a wireless connection. The data mining has to be performed on board using a streaming approach, since the amount of data that would otherwise have to be transmitted to the central server is too large.

The basic idea of the VEDAS data mining module is to provide distributed mining of multiple mobile data sources with little centralization. The data mining algorithms are designed around the following ideas: minimize data communication; minimize power consumption; minimize onboard storage; minimize computing usage; respect privacy constraints.
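The design ideas listed above (process each reading once, keep only constant-size state on board, and transmit aggregates rather than raw data) can be illustrated with a small sketch. This is not the VEDAS implementation; the function and parameter names (`monitor`, `window`, `z_limit`), the message format, and the use of Welford's running mean and variance are assumptions made for illustration.

```python
import math


class RunningStats:
    """Constant-memory mean and standard deviation (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def monitor(readings, window=100, z_limit=4.0):
    """Yield short alert and summary messages instead of the raw stream.

    `window` and `z_limit` are hypothetical tuning parameters.
    """
    stats = RunningStats()
    for i, x in enumerate(readings, start=1):
        is_outlier = (stats.n >= 10 and stats.std > 0
                      and abs(x - stats.mean) > z_limit * stats.std)
        if is_outlier:
            # Report the unusual reading and keep it out of the baseline.
            yield {"type": "alert", "index": i, "value": x}
        else:
            stats.add(x)
        if i % window == 0:
            # Periodic aggregate: the only routine traffic to the server.
            yield {"type": "summary", "n": stats.n,
                   "mean": stats.mean, "std": stats.std}


# Example: a synthetic sensor trace with one abnormal reading.
trace = [90 + (i % 5) for i in range(200)]
trace[150] = 140
messages = list(monitor(trace))
```

For the 200 synthetic readings, only a handful of messages would leave the vehicle, and memory use stays constant regardless of how long the stream runs, which is the trade-off a resource-constrained setting calls for.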
VEDAS implements incremental PCA, incremental Fourier transform, online linear segmentation, incremental k-means clustering and several lightweight statistical techniques. The basic ideas of these algorithms are of course well known; the innovation lies in adapting them to a resource-constrained environment, resulting in new approximate solutions.

Case Study 5: Web 2.0 – Music Mining

With the advent of Web 2.0, collaborative structuring of large collections of multimedia data based on metadata and media features has become a significant task. Nemoz (NEtworked Media Organizer [12]) is a Web 2.0-inspired collaborative platform for playing music and for browsing, searching and sharing music collections. It works in a distributed scenario, a loosely coupled P2P domain. Nemoz has facilities for Web 2.0-style tagging, but also allows users to automatically classify their audio collection using machine learning.

Nemoz is motivated by the observation that a globally correct classification for audio files does not exist, since each user has their own way of structuring the files, reflecting their own preferences and needs. Still, a user can exploit labels provided by other peers as features for their own classification: the fact that Mary, who structures her collection along mood, classifies a song as “melancholic” might indicate to Bob, who classifies along genre, that it is not a Techno song. To support this, Nemoz nodes are able to exchange information about their individual classifications. These added labels are used in a predictive machine learning task. Nemoz thereby introduces a new type of learning problem [18]: the collaborative representation problem.

This application is representative of an innovative subclass of applications in a Web 2.0 environment. Whereas most Web 2.0 tagging applications use a central server where all media data and tags are consolidated, the current application is fully distributed.
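The Mary and Bob example can be sketched in a few lines. The following is not Nemoz's learning algorithm; it is a deliberately simple co-occurrence vote, and the function names (`train`, `predict`), the data layout, and the `mary:` tag prefix are assumptions made for illustration. Tags received from peers serve as features, and the user's own labels serve as the prediction target.

```python
from collections import Counter, defaultdict


def train(own_labels, peer_tags):
    """Count how often each peer tag co-occurs with each of the user's labels.

    own_labels: dict song -> the user's own label (e.g. a genre)
    peer_tags:  dict song -> list of tags received from other peers
    """
    counts = defaultdict(Counter)
    for song, label in own_labels.items():
        for tag in peer_tags.get(song, ()):
            counts[tag][label] += 1
    prior = Counter(own_labels.values())
    return counts, prior


def predict(song, peer_tags, model):
    """Predict the user's label for a song from the peers' tags alone."""
    counts, prior = model
    votes = Counter()
    for tag in peer_tags.get(song, ()):
        seen = counts.get(tag)
        if not seen:
            continue  # tag never observed together with an own label
        total = sum(seen.values())
        for label, c in seen.items():
            votes[label] += c / total
    if not votes:
        # No informative peer tag: fall back to the user's most common label.
        return prior.most_common(1)[0][0]
    return votes.most_common(1)[0][0]


# Bob labels by genre; Mary's mood tags arrive from a peer node.
bob_labels = {"s1": "techno", "s2": "techno", "s3": "ballad", "s4": "ballad"}
peer_tags = {
    "s1": ["mary:energetic"], "s2": ["mary:energetic"],
    "s3": ["mary:melancholic"], "s4": ["mary:melancholic"],
    "s5": ["mary:melancholic"],  # not yet labeled by Bob
}
model = train(bob_labels, peer_tags)
guess = predict("s5", peer_tags, model)
```

Note that Bob never adopts Mary's mood taxonomy; her tags only act as evidence for his own genre labels, and each peer keeps its own local model, in line with the fully distributed setting described above.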



Publication date: 2007